Genetic delimitation of Oreocharis species from Hainan Island

Abstract Hainan Island harbours an extraordinary diversity of Gesneriaceae with 14 genera and 23 species, amongst which two species and one variety are recognised in the genus Oreocharis. These three Oreocharis taxa are all Hainan-endemics and show a complex geographical distribution pattern with considerable morphological intermixtures. In this study, we combined DNA (nuclear ITS sequences and cpDNAtrnL-trnF and ycf1b) to evaluate genetic delimitation for 12 Oreocharis populations from the island, together with morphological similarity analysis using 16 morphological traits. The results showed Hainan Oreocharis taxa were monophyletic with relative low genetic diversity within populations, highly significant genetic differentiation amongst populations and a significant phylogeographical structure. The 12 populations formed three genetically distinct groups, roughly correspondent to the currently recognised two species and one unknown lineage. The PCA analyses of morphological traits indicate three distinctive groups, differing mainly in petal colour and corolla shapes. The roles of river and mountain isolations in the origin and distribution of these three lineages are discussed.


Introduction
Hainan Island is the largest tropical island in China, with an area of 33,920 km 2 . As a biodiversity hotspot in the world (Myers et al. 2000), Hainan Island has a species-rich and remarkable endemic flora (Francisco-Ortega et al. 2010), which is remarkably richer in endemic genus than Taiwan Island (36,193 km 2 ) and contains almost twice the number of Gesneriaceae species than Sri Lanka (65,610 km 2 ), which is twice the size of Hainan Island (Ranasinghe et al. 2019). The richest biodiversity is concentrated in the southcentral mountains of the island (Li and Wang 2005;Xing 2012;Yang 2013), such as Mt. Wuzhi (the highest peak with 1867 m) and Mt. Yingge (1812 m). Xing (2012) and Ling et al. (2017) identified 14 genera and 23 species of Gesneriaceae on Hainan Island, amongst which two genera (Metapetrocosmea W.T. Wang and Cathayanthe Chun) and eight species (including one variety) are endemic (Ling et al. 2017;Jiang et al. 2017).
Interestingly, there are three recognised taxa of Oreocharis Bentham on Hainan Island (O. flavida Merrill, O. dasyantha Chun and O. dasyantha Chun var. ferruginosa Pan) and all are endemic to the island (Wei 2010;Xing 2012;Yang 2013;Ling et al. 2017). During three years' observation, we found these Oreocharis taxa to possess diverse floral syndromes in a single currently recognised species and mixed distribution of different species (Wei 2010;Xing 2012;Yang 2013), together with considerable genetic differentiations amongst populations (Xing et al. 2018). The presence of a variety, i.e. O. dasyantha var. ferruginosa, further complicates the taxonomy classification of Hainan Oreocharis.
Here, we sampled 12 populations of Oreocharis taxa covering its entire distribution range on Hainan Island and examined their molecular phylogenetic relationships with one nuclear DNA fragment and two combined chloroplast DNA sequences separately. We also quantitatively analysed 16 morphological traits with principal component analysis (PCA). We aim to determine (1) whether or not the currently recognised three species or variety can be supported by genetic data (2) what factors (e.g. geographic isolation, pollination isolation, climate or intrinsic traits) are responsible for the evolution and maintenance of these Hainan-endemic Oreocharis?  Fig.1). Fresh leaves were collected from the south-central mountains in Hainan Island in 2015, 2016 and 2017 and dry stored in silica gel. In total, 238 leaf samples from 12 populations that represented the whole geographical range of Oreocharis taxa on Hainan Island were collected (Fig.1).

Materials collection and DNA extraction
Total genomic DNA for each individual was extracted using CTAB methods (Doyle and Doyle 1987) and served as a template for the polymerase chain reaction. AL2000 DNA marker (Aidlab Biotechnologies Co. Ltd) was used to detect DNA quality and quantity on 0.8% agarose gels stained with 2.5 μl Goldview (Aidlab Biotechnologies Co. Ltd) in DTU-48 spectrophotometer (Hangzhou Miu Instruments Co. Ltd, China).

PCR amplification and sequencing
One nuclear ribosomal DNA (nrDNA) sequence, the ITS region comprising spacer 1, the 5.8S gene and spacer 2 (White et al. 1990) and two chloroplast DNA (cpDNA) intron-spacer region trnL-trnF (Taberlet et al. 1991) and ycf1b (Dong et al. 2015) were used in this study (Table 2). PCR reactions were set up in a volume of 25 μl consisting of 20 μl ddH 2 O, 2.5 μl 10×Buffer, 0.5 μl dNTPs (10 mM), 0.5 μl each 5 μM primer, 0.5 μl DNA template and 0.5 μl 5 U/μl Taq polymerase (Aidlab Biotechnologies Co. Ltd). PCR was conducted in a 2720 Thermal cycler (Applied Biosystems by Life Technologies, made in Singapore) and Veriti 96-Well Thermal Cycler (Applied Biosystems by Life Technologies, made in Singapore). The PCR programme for nrDNA and trnL-trnF was designed for an initial denaturation at 94 °C 5 min, followed by 35 cycles of 1 min at 94 °C, 1 min at 55 °C, 1 min at 72 °C and with a final extension of 10 min at 72 °C. Amplification of ycf1b used the following protocol: 4 min at 94 °C, 35 cycles of 30 s at 94 °C, 40 s at 58 °C and 1 min at 72 °C, ending with 10 min at 72 °C. All the PCR products were checked by electrophoresis. Then purification and sequencing of PCR products were finally sequenced by an ABI 3730 DNA Analyzer based on the BigDye Terminator Cycle Sequencing Ready Kit (Applied Biosystems, Foster City, CA) in BGI (Beijing Genomics Institution), the chemistry and primers being used above in BGI.

The systematic position of Oreocharis taxa in Hainan Island
In order to explore the systematic position of Oreocharis taxa in Hainan Island, we followed Möller et al. (2011a, b) and Chen et al. (2014) and used 57 other Oreocharis species with suitable DNA sequences in the study. Finally, a total of 60 species were included in the phylogenetic analysis. We manually aligned all sequences using MEGA v.6.5 (Kumar et al. 2008) and excluded ambiguous positions from the alignments. The two no-coding gene ITS1/2 and trnL-trnF were concatenated to a single matrix by the programme SequenceMatrix v.1.7.8 (Vaidya et al. 2011) after a congruency test by PAUP* 4.0a164 (Swofford 2003). We inferred the optimal model of nucleotide substitution by MR-MODELTEST 2.3 (Nylander 2004), based on the AIC (Akaike Information Criteria) (Akaike 1981). In addition, the most suitable model GTR+I+G was used in both ML and BI analysis. Maximum Likelihood (ML) analysis was conducted using MEGA v.6.5 (Kumar et al. 2008) with the optimal substitution models to carry out 1000 bootstrap (BS) replicates. Bayesian Inference (BI) analysis was conducted using MrBayes version 3.1.2 (Huelsenbeck and Ronquist 2001). The Markov Chain Monte Carlo (MCMC) was analysed for 10 million generations and sampling every 10000 generations for two independent Bayesian runs. The first 2500 trees (25% of total trees) were discarded as burn-in and the remaining trees were summarised in a 50% majority-rule consensus tree with the posterior probabilities (PP). The mean and posterior of each branch were visualised by FIGTREE v.1.4.2 (Rambaut 2009). Sequences used are showed in Appendix 2.

Genetic diversity and differentiation
The original chromatograms from both directions of the ITS and cpDNA sequences obtained were evaluated with the software BioEdit (Hall 1999) for base confirmation and contiguous sequences editing, then sequences were manually aligned, where necessary, using MEGA v.6.5 (Kumar et al. 2008) and ambiguous positions were excluded from the alignments. All sequences have been deposited in GenBank (MK587942-MK588003). Subsequently, we combined the two no-coding cpDNA regions as a single locus by the programme SequenceMatrix v.1.7.8 (Vaidya et al. 2011). Then, we performed a Partition Homogeneity Test based on the combined cpDNA and an Incongruence Length Difference Test, based on combined ITS and cpDNA using PAUP* v. 4.0a164 (Swofford 2003).
The number of nucleotypes/haplotypes, number of nucleotypes/polymorphic sites (S), nucleotype/haplotype diversity (h), nucleotide diversity (π) and measures of DNA divergence (K) values were analysed by the programme DNASP v. 6.12.01 (Rozas et al. 2017) for each population and Fu's Fs (Fu 1997) and Tajima's D (Tajima 1989) values were tested for vital deviations from the null hypothesis of neutral evolution and constant population size, based on the ITS and cpDNA sequences separately. We generated the geographical distribution of nucleotypes/haplotypes according to sampling information (Table 1).
Genetic diversity within populations (Hs; Nei 1973), in total populations (H T ), total gene diversity index (N ST ) and genetic differentiation index within populations (G ST ) were measured using Haplonst (Pons and Petit 1996) and G ST and N ST compared by the U test (Pons and Petit 1996) based on the ITS and cpDNA sequences separately.
The Analysis of Molecular Variance (AMOVA) was conducted to estimate genetic variation which was assigned within and amongst populations using GENALEX v. 6.503 (Peakall and Smouse 2012), based on the ITS and cpDNA sequences separately.

Phylogenetic relationships
Phylogenetic relationships of nucleotypes/haplotypes were inferred with BI using Mr-Bayes v. 3.2.6 (Ronquist et al. 2012). According to test above, O. sinohenryi, which had the closest phylogenetic relationships with the Hainan Oreocharis taxa, were used as outgroups with sequences of nrDNA.
Prior to Bayesian analysis, the optimal model of nucleotide substitution was detected for each gene using MRMODELTEST v. 2.3 (Nylander 2004), based on the AIC (Akaike 1981). Two independent Bayesian runs of MCMC were performed for 10 million generations, sampling every 10000 generations. We accessed the Chain convergence in Tracer v. 1.7.1 (Rambaut and Drummond 2007) by checking the effective sample size (ESS) that was larger than 200 for each parameter. To further explore the relationships amongst unique nucleotypes, genealogical relationships were inferred from Median-Joining network (MJ) of NETWORK v. 5.001 (http://www.fluxus-Engineering.com/).

Neighbour-joining (NJ) tree and structure
All sequences of each population were chosen to represent effective geographic populations themselves. The method for the Neighbour-joining (NJ) tree was selected to build the phylogenetic relationship of Oreocharis taxa populations in Hainan Island by MEGA v.6.5, with Kimura two-parameter model (Kimura 1980), based on the ITS and two combined cpDNA sequences separately.
A Bayesian clustering approach conducted in STRUCTURE v. 2.3.4 (Earl and Vonholdt 2012) was used to detect the population genetic structure of the Hainan Oreocharis taxa, based on ITS and combined cpDNA sequences separately. The number of possible clusters (K) was set from 1 to 10 and each K run 10 times. Each run comprised a burn-in period of 1 × 10 5 interactions with 1 × 10 5 MCMC steps after burning. The most suitable value of K was determined from Structure Harvester (Pritchard et al. 2000; http://taylor0. biology.ucla.edu/structureHarvester/) by using ΔK and the log-likelihood value. Finally, the result from programme STUCTURE for the best K value was drawn in CLUMPAK server (Jakobsson and Rosenberg 2007; http://clumpak.tau.ac.il/index.html).

Isolation by distance (IBD)
To detect whether there was local genetic variation under geographically limited dispersal, isolation by distance (IBD) for each population was tested by a Mantel test in GENALEX between pairwise genetic distance (uncorrected sequence divergence (Dxy) for nuclear DNA and cpDNA) and geographical distance.
Five stamen traits were included in the analyses: (i) anther position (includedanthers hidden inside the floral tube, floral throat-anthers lying in the throat of floral tube, exerted-anthers are exposed outside the floral tube), (ii) stamen type (monomorphic, didynamous), (iii) pollen presentation (simultaneous, separately for each pair), (iv) anther shape (oval, horseshoe) and (v) hair on filament (absent, present).
Two stigma characters and two leaf traits were also included in the analyses: (i) location of stigma (included-stigma hidden inside the floral tube, throat-stigma lying in the throat of floral tube, exerted-stigma is exposed outside the floral tube), (ii) number of stigma (one, two), (iii) serration of leaf edge (present, absent) and (iv) leaf epidermal hair in abaxial side (absent, present). Measurements were taken with a rectilinear scale and rounded to the nearest 0.1 mm.
Principal Component Analysis (PCA) was conducted in SPSS v. 19.0 (Liu and Li 2014) to determine the traits with the highest value for classification and the plotting map.

Monophyly of the Hainan Oreocharis taxa
The combined ITS1/2 and trnL-F datasets of Hainan Oreocharis taxa with other 57 Oreocharis species were 568 and 871 bp long, amongst which 233 and 89 were polymorphic sites and 141 and 38 were parsimony informative sites, respectively. The aligned dataset was 1439 bp long with a total number of 305 polymorphic sites measured, of which 160 were parsimony informative sites. There was no significant incongruence, based on the incongruence length difference (ILD) test between the ITS1/2 and trnL-F (p > 0.05).
Both the BI and ML analysis showed Hainan Oreocharis taxa being monophyly with PP (posterior probability) = 0.79 and BS (bootstrap value) = 38% (Appendix 1). In addition, O. sinohenryi, whose regions are restricted in South China (Guangxi and Guangdong Province), is the sister to Hainan Oroecharis taxa in the current tree with relatively high support (Appendix 1).

Genetic diversity and differentiation
The aligned ITS sequence matrix comprised in total of 670 basepairs (bp). A total number of 56 polymorphic sites were present, of which 48 were parsimony-informative, which allowed the identification of 16 different nucleotypes from a size of 238 samples (Table 1, Fig.1a). Four nucleotypes, H1, H2, H8 and H12, were shared amongst populations, the other 12 nucleotypes were private, i.e. present in only one population (Table 1, Fig. 1a).
The combined alignment of the two cpDNA regions was in total 1615 bp long (858 and 757 bp for trnL-trnF and ycf1b, respectively) with a significant rate of homogeneity  (P = 1) in the congruency test, indicating that there was no significant difference in the laboratory between the two cpDNA regions. The alignment contained 55 polymorphic sites and 8 indels (Table 2). A total of 23 chloroplast haplotypes were present amongst the 238 samples (Fig. 1b, Table 1). Of these, only two haplotypes, H2 and H20, were shared amongst several populations, whereas the other 21 haplotypes were private (Table 1, Fig. 1b). The combined dataset, based on ITS and cpDNA as pairwise ILD tests, showed that the two DNA regions were not significantly different from each other (P > 0.05).
Haplotypes diversity (Hd) and nucleotide diversity (Pi) for each population are summarized in Table 1 and there is little difference between nrDNA and cpDNA. Generally, except for population WZB, YG and LM presented high genetic diversity in both nrDNA and cpDNA, nuclear gene ITS in population CH and chloroplast gene trnL-F and ycf1b in population DE, NG and HM showed variable genetic diversity, the rest of the populations having very low nucleotide and haplotype diversity (Table 1).
In total, the average intrapopulation diversity H S was lower than the genetic diversity H T. Both in ITS and cpDNA sequences, total gene diversity index (N ST ) was not significantly greater than the genetic differentiation index within populations (G ST , P > 0.05), revealing that Hainan Oreocharis taxa have no correspondence between haplotype comparability and geographic distribution (Appendix 3).
The AMOVA indicated that almost all variation (99% and 97%) was partitioned amongst populations, which was higher than the variation (1% and 3%) within populations, based on the ITS and cpDNA data, respectively, revealing highly significant genetic differentiation amongst populations (Table 3).

Phylogenetic relationship
Both phylogenetic trees, based on ITS nucleotypes and cpDNA haplotypes, indicated that nucleotypes/haplotypes can be separated into three main groups with strong bayesian probabilities (> 0.95) (Figs 2, 3). The nucleotype/haplotype network of nuclear DNA and cp-DNA was concordant with the phylogenetic relationship, which presented three centrally located nodes, representing possible ancestral haplotypes with a high frequency (Figs 2b,  3b). The rest of the haplotypes were connected to the central haplotypes by one to four steps in a star-like network. In the network of nuclear DNA, nucleotypes H1 and H8 occurred at the highest frequency, indicating they probably are ancestral nucleotypes of O. dasyantha. In the network of cpDNA, haplotypes H12 may be the ancestral haplotypes of O. flavida since it was at the centrally located nodes with highest occupied frequency.

Neighbour-joining (NJ) tree and Population structure
The results of the NJ tree, based on nrDNA and cpDNA, suggested 12 populations were clearly clustered into three major groups, which well corresponded to the three defined Oreocharis taxa in Hainan Island, i.e. O. dasyantha (includes O. dasyantha var. ferruginosa),  O. flavida and Oreocharis sp. Additionally, the analyses also presented a close relationship between O. flavida and Oreocharis sp., then with O. dasyantha. Although the signal was stronger for cpDNA (Rxy = 0.473, P < 0.001) than for nuclear DNA (Rxy = 0.257, P < 0.001), the relationship between genetic and geographical distance for 12 populations was significant both in nuclear DNA and cpDNA (Appendix 3).

Ordinations of morphological traits
According to the floral syndromes, the Principal Component Analysis of 16 floral characters of Hainan Oreocharis populations can be divided into three clusters (Fig. 5): (1) tubular, zygomorphic flowers with yellow tube but orange limbs, monomorphic stamens, pol-

Monophyly of the Hainan Oreocharis taxa
The phylogenetic tree showed that Hainan Oreocharis taxa are monophyly (Appendix 1), suggesting a single dispersal of Oreocharis into Hainan Island. The sister species to Hainan Oreocharis is O. sinohenryi, which is restricted to South China including Guangxi and Guangdong provinces. Hainan Island is only about 30 km from these provinces, thus such observed pattern can be simply explained by geographic relationships.

Genetic diversity and structure
Most Oreocharis populations hold very low nucleotide and haplotype diversity (Table  1) and overall populations revealed a high level of genetic differentiation (Table 3)  The unknown species is restricted in the upper reaches of the largest river on the island, i.e. Nandu River. We concluded that these three groups may have evolved and maintained largely through allopatric differentiations.
Mountains can also probably explain such observed pattern with geographic isolation of these groups. Almost all Oreocharis populations in Hainan Island were restricted in > 1000 m high-elevation mountains with massive humidity, such that the islandlike habitat became fragmented caused by a deep and wide valley in the complicated mountains system, which resulted in blocking of gene flow of Oreocharis populations with weak seed dispersal ability even at the fine scale (Xing et al. 2018). Li et al. (2019) found that geographic isolation by Changhua River is a driving force for the strong population differentiation in the Hainan-endemic Primulina heterotricha Merr. and Metapetrocosmea peltata (Merr. et Chun) W. T. Wang. Our results can also be explained by the isolation of Changhua River (Figs 1, 4), which indicated that Changhua River may play a key role in driving population divergence and speciation of the Hainan Oreocharis taxa (Xing et al. 2018;Li et al. 2019).

Genetic differentiation and species delimitation
Secondly, the 'sky island' caused by high mountains may also cause such genetic differentiation for montane species (Palma-Silva et al. 2011;Tapper et al. 2014;Robin et al. 2015). Mountain tops in Hainan Island have tropical mountain cloud forests (> 1200 m) in Mt. Wuzhi, Mt. Yingge, Mt. Bawang, Mt. Jianfeng and Mt. Limu (Wang et al. 2016) which fragmented and restricted the island-like habitat of Oreocharis taxa. Alpine plant radiations have accelerated speciation with trait diversification (Sanderson 1998;Colin and Ruth 2006;Hughes and Atchison 2015) and, in general, these radiations are geared to be recent and rapid (Linder 2008). Almost all Oreocharis taxa in Hainan Island lived in high mountains except Population CH, which grew in a low-altitude habitat and held a distinct structure from other high-altitude populations of O. dasyantha, indicating the sky-island effect may drive population divergence and speciation. According to morphological traits, all the 12 Oreocharis populations were also grouped into three clusters and corolla colour, shape and types are the main characters for distinguishing groups (Fig. 5; Ling et al. 2017). Such differences in floral